Abstract. Semantic publishing offers the promise of computable papers,
enriched visualisation and a realisation of the linked data ideal. In
reality, however, the publication process contrives to prevent richer se- 
mantics while culminating in a 'lumpen' PDF. In this paper, we discuss 
a web-first approach to publication, and describe a three-tiered approach 
which integrates with the existing authoring tooling. Critically, although 
it adds limited semantics, it does provide value to all the participants in 
the process: the author, the reader and the machine. 
License: This work is licensed under a Creative Commons Attribution
3.0 Unported License. http://creativecommons.org/licenses/by/3.0/. It
is also available at
http://www.russet.org.uk/blog/2012/04/three-steps-to-heaven/



1 Introduction 

The publishing of both data and narratives on those data is changing radically.
Linked Open Data and related semantic technologies allow for semantic publishing
of data. We still need, however, to publish the narratives on that data, and
that style of publishing is in the process of change; one of those changes is the
incorporation of semantics [1,2,3]. The idea of semantic publishing is an attractive
one for those who wish to consume papers electronically; it should enhance the
richness of the computational component of papers [2]. It promises a realisation
of the vision of a next generation of the web, with papers becoming a critical
part of a linked data environment [1,4], where the results and narratives become
one.

The reality, however, is somewhat different. There are significant barriers to 
the acceptance of semantic publishing as a standard mechanism for academic 
publishing. The web was invented around 1990 as a light-weight mechanism for 
publication of documents. It has subsequently had a massive impact on society 
in general. It has, however, barely touched most scientific publishing; while most 
journals have a website, the publication process still revolves around the generation
of papers, moving from Microsoft Word or LaTeX through to a final
PDF which looks, feels and is something designed to be printed onto paper⁴.

4 This includes conferences dedicated to the web and the use of web technologies. 



Adding semantics into this environment is difficult or impossible; the content 
of the PDF has to be exposed and semantic content retro-fitted or, in all likeli- 
hood, a complex process of author and publisher interaction has to be devised 
and followed. If semantic data publishing and semantic publishing of academic 
narratives are to work together, then academic publishing needs to change. 

In this paper, we describe our attempts to take a commodity publication 
environment, and modify it to bring in some of the formality required from 
academic publishing. We illustrate this with three exemplars — different kinds 
of knowledge that we wish to enhance. In the process, we add a small amount 
of semantics to the finished articles. Our key constraint is the desire to add 
value for all the human participants. Both authors and readers should see and 
recognise additional value, with the semantics a useful or necessary byproduct 
of the process, rather than the primary motivation. We characterise this process 
as our "three steps to heaven", namely:

— make life better for the machine to 

— make life better for the author to 

— make life better for the reader 

While requiring additional value for all of these participants is hard, and 
places significant limitations on the level of semantics that can be achieved, we 
believe that it does increase the likelihood that content will be generated in 
the first place, and represents an attempt to enable semantic publishing in a 
real-world workflow.

2 Knowledgeblog 

The knowledgeblog project stemmed from the desire for a book describing the 
many aspects of ontology development, from the underlying formal semantics, to 
the practical technology layer and, finally, through to the knowledge domain [5].
However, we have found the traditional book publishing process frustrating and
unrewarding. While scientific authoring is difficult in its own right, our own
experience suggests that the publishing process is extremely hard work. This is
particularly so for multi-author collected works, which are often harder for the
editor than writing a book "solo". Finally, the expense and hard-copy nature of
academic books means that, again in our experience, few people read them.

This contrasts starkly with the web-first publication process that has become
known as blogging. With any of a number of ready-made platforms, it is possible
for authors with little or no technical skill to publish content to the web with
ease. For knowledgeblog ("kblog"), we have taken one blogging engine, WordPress
[7], running on low-end hardware, and used it to develop a multi-author
resource describing the use of ontologies in the life sciences (our main field of
expertise). There are also kblogs on bioinformatics⁵ and the Taverna workflow
environment [8]⁶. We have previously described how we addressed some of the
social aspects, including attribution, reviewing and immutability of articles [6].

5 http://bioinformatics.knowledgeblog.org
6 http://taverna.knowledgeblog.org



As well as delivering content, we are also using this framework to investigate 
semantic academic publishing, investigating how we can enhance the machine in- 
terpretability of the final paper, while living within the key constraint of making 
life (slightly) better for machine, author and reader without adding complexity 
for the human participants. 

Scientific authors are relatively conservative. Most of them have well-established 
toolsets and workflows which they are relatively unwilling to change. For in- 
stance, within the kblog project, we have used workshops to start the process of 
content generation. For our initial meeting, we gave little guidance on authoring 
process to authors, as a result of which most attempted to use WordPress di- 
rectly for authoring. The WordPress editing environment is, however, web-based, 
and was originally designed for editing short, non-technical articles. It did not
appear to work well for most scientists.

The requirements that authors have for such 'scientific' articles are manifold. 
Many wish to be able to author while offline (particularly on trains or planes). 
Almost all scientific papers are multi-author, and some degree of collaboration 
is required. Many scientists in the life sciences wish to author in Word because 
grant bodies and journals often produce templates as Word documents. Many 
wish to use LaTeX, because its idiomatic approach to programming documents
is unreplicable with anything else. Fortunately, it is possible to induce WordPress
to accept content from many different authoring tools, including Word
and LaTeX.

As a result, during the kblog project, we have seen many different workflows
in use, often highly idiosyncratic in nature. These include: 

Word/Email: Many authors write using MS Word and collaborate by emailing 
files around. This method has a low barrier to entry, but requires significant 
social processes to prevent conflicting versions, particularly as the number 
of authors increases. 



Word/Dropbox: For the Taverna kblog, authors wrote in Word and collaborated
with Dropbox. This method works reasonably well where many authors
are involved; Dropbox detects conflicts, although it cannot prevent or
merge them.

Asciidoc/Dropbox: Used by the authors of this paper. Asciidoc is relatively
simple, somewhat programmable and accessible. Unlike LaTeX, which can be
induced to produce HTML with effort, asciidoc is designed to do so.

Of these three approaches, the Word/Dropbox combination is probably the
most generally used.

From the reader's perspective, a decision that we have made within knowledgeblog
is to be "HTML-first". The initial reasons for this were entirely practical;
supporting multiple toolsets is hard, particularly if any degree of consistency
is to be maintained; the generation of the HTML is at least partly controlled
by the middleware, WordPress in kblog's case. As well as enabling consistency
of presentation, it also, potentially, allows us to add additional knowledge; it
makes semantic publication a possibility. However, we are aware that knowledgeblog
currently scores rather badly on what we describe as the "bath-tub
test"; while exporting to PDF or printing out is possible, the presentation is
not as "neat" as would be ideal. In this regard (and we hope only in this regard),
the knowledgeblog experience is limited. However, increasingly, readers
are happy to interact with material on the web, and capable of doing so, without
printouts.

From this background and aim, we have drawn the following requirements: 

1. The author can, as much as possible, remain within familiar authoring en- 
vironments; 

2. The representation of the published work should remain extensible to, for 
instance, semantic enhancements; 

3. The author and reader should be able to have the amount of "formal" aca- 
demic publishing they need; 

4. Support for semantic publishing should be gradual and offer advantages for 
author and reader at all stages. 

We describe how we have achieved this with three exemplars, two of which 
are relatively general in use, and one more specific to biology. In each case, we 
have taken a slightly different approach, but have fulfilled our primary aim of 
making life better for machine, author and reader. 

3 Representing Mathematics 

The representation of mathematics is a common need in academic literature. 
Mathematical notation has grown from a requirement for a syntax which is highly 
expressive and relatively easy to write. It presents specific challenges because of 
its complexity, the difficulty of authoring and the difficulty of rendering, away 
from the chalk board that is its natural home. 

Support for mathematics has had a significant impact on academic publishing.
It was, for example, the original motivation behind the development of
TeX [9], and it is still one of the main reasons why authors wish to use it or its
derivatives. This is to such an extent that much mathematics rendering on the
web is driven by a TeX engine somewhere in the process. So MediaWiki (and
therefore Wikipedia), Drupal and, of course, WordPress follow this route. The
latter provides plugin support for TeX markup using the wp-latex plugin [10].
Within kblog, we have developed a new plugin called mathjax-latex [11]. From
the kblog author's perspective these two offer a similar interface; the differences
are, therefore, described later.

Authors write their mathematics directly as TeX using one of four markup
syntaxes. The most explicit (and therefore least likely to happen accidentally)
is through the use of "shortcodes". These are an HTML-like markup originating
from some forum/bulletin board systems. In this form an equation would be
entered as [latex]e=mc^2[/latex], which would be rendered as "e = mc²". It
is also possible to use three other syntaxes which are closer to math-mode in
TeX: $$e=mc^2$$, $latex e=mc^2$, or \[e=mc^2\].
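To illustrate how the four syntaxes can be recognised mechanically, the
following Python sketch locates each form with a regular expression. It is a
minimal illustration only, and not the code used by wp-latex or mathjax-latex;
the patterns and the name find_math are assumptions made for this example.

```python
import re

# One pattern per markup syntax named in the text. These expressions are
# illustrative assumptions, not the parser used by either plugin.
MATH_PATTERNS = [
    re.compile(r"\[latex\](.+?)\[/latex\]", re.DOTALL),  # [latex]...[/latex]
    re.compile(r"\$\$(.+?)\$\$", re.DOTALL),             # $$...$$
    re.compile(r"\$latex\s+(.+?)\$", re.DOTALL),         # $latex ...$
    re.compile(r"\\\[(.+?)\\\]", re.DOTALL),             # \[...\]
]


def find_math(text):
    """Return the TeX source of every equation found, in document order."""
    found = []
    for pattern in MATH_PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(1).strip()))
    # Sort by position so that results follow the order of the document.
    return [tex for _, tex in sorted(found)]
```

Because each syntax has its own opening delimiter, the same equation yields
the same TeX string whichever form the author typed.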

From the authorial perspective, we have added significant value, as it is
possible to use a variety of syntaxes, which are independent of the authoring
engine. For example, a TeX-loving mathematician working with a Word-using
biologist can still set their equations using TeX syntax. Word will not
render these at authoring time but, in practice, this causes few problems for
such authors, who are experienced at reading TeX. Within a LaTeX workflow,
equations will be renderable both locally, with source compiled to PDF, and
when published to WordPress.

There is also a W3C recommendation, MathML, for the representation and
presentation of mathematics. The kblog environment also supports this. In this 
case, the equivalent source appears as follows: 

<math>
  <mrow>
    <mi>E</mi>
    <mo>=</mo>
    <mrow>
      <mi>m</mi>
      <msup>
        <mi>c</mi>
        <mn>2</mn>
      </msup>
    </mrow>
  </mrow>
</math>

One problem with the MathML representation is obvious: it is very long-winded.
A second issue, however, is that it is hard to integrate with existing
workflows; most of the publication workflows we have seen in use will, on recognising
an angle bracket, turn it into the equivalent HTML entity. For some workflows
(LaTeX, asciidoc) it is possible, although not easy, to prevent this within
the native syntax.

It is also possible to convert from Word's native OMML ("equation editor")
XML representation to MathML, although this does not integrate with Word's
native blog publication workflow. Ironically, it is because MathML shares an
XML-based syntax with the final presentation format (HTML) that the problem
arises. The shortcode syntax, for example, passes straight through most of
the publication frameworks to be consumed by the middleware. From a pragmatic
point of view, therefore, supporting shortcodes and TeX-like syntaxes has
considerable advantages.
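The difference in behaviour can be seen in a small Python sketch. Here the
standard library function html.escape stands in for an authoring tool's export
step; this is an assumption made for illustration, since real tools differ in
exactly what they escape. The name naive_publish is hypothetical.

```python
from html import escape


def naive_publish(source):
    """Mimic an export step that escapes &, < and > in author-supplied text."""
    return escape(source, quote=False)


mathml = "<math><mi>E</mi></math>"
shortcode = "[latex]E=mc^2[/latex]"

# The angle brackets are turned into entities, so the MathML no longer parses.
escaped_mathml = naive_publish(mathml)

# The shortcode contains no characters that need escaping, so it is unchanged.
escaped_shortcode = naive_publish(shortcode)
```

After this step, escaped_mathml begins with "&lt;math&gt;" while
escaped_shortcode is identical to its input, which is why the shortcode still
reaches the middleware intact.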

For the reader, the use of mathjax-latex has significant advantages. The default
mechanism within WordPress uses a math-mode like syntax, $latex e=mc^2$.
This is rendered using a TeX engine into an image which is then incorporated
and linked using normal HTML capabilities. This representation is opaque and
non-semantic; it has significant limitations for the reader. The images are not
scalable, so zooming in causes severe pixelation; the background to the mathematics
is coloured inside the image, so does not necessarily reflect the local style.

Kblog, however, uses the MathJax library [12] to render the underlying TeX
directly on the client. This has a number of significant
advantages for the reader. First, where the browser supports them, MathJax uses
webfonts to render the equations; these are scalable, attractive and standardized.
Where they are not available, MathJax can fall back to bitmapped fonts. The
reader can also access additional functionality: clicking on an equation will raise
a zoomed-in popup, while the context menu allows access to a textual representation
either as TeX or MathML, irrespective of the form that the author used.
This can be cut and pasted for further use.

Our use of MathJax provides no significant disadvantages to the middleware
layers. It is implemented in JavaScript and runs in most environments. The
library is fairly large (>100Mb), but it is available on a CDN, so need not stress
server storage space. Most of this space comes from the bit-mapped fonts, which
are only downloaded on demand, so should not stress web clients either. It also
obviates the need for a TeX installation, which wp-latex may require (although
this plugin can also use an external server).

At face value, mathjax-latex necessarily adds very little semantics to the
maths embedded within documents. The maths could be represented as $$E=mc^2$$,
\(E=mc^2\) or

<math> <mrow> <mi>E</mi> <mo>=</mo> <mrow> <mi>m</mi> 
<msup> <mi>c</mi><mn>2</mn> </msup> 
</mrow> </mrow> </math> 

So, we have a heterogeneous representation for identical knowledge. However,
in practice, the situation is much better than this. The author of the work created
these equations and has then read them, transformed by MathJax into a rendered
form. If MathJax has failed to translate them correctly, in line with the author's
intention, or if it has had some implications for the text in addition to setting
the intended equations (if the TeX-style markup appears accidentally elsewhere
in the document), the author is likely to have seen this and fixed the problem.
Someone wishing, for example, to extract all the mathematics as MathML from
these documents computationally, therefore, knows:

— that the document contains maths as it imports MathJax 

— that MathJax is capable of identifying this maths correctly 

— that equations can be transformed to MathML using MathJax¹⁰

So, while our publication environment does not result directly in a lower level
of semantic heterogeneity, it does provide the data and the tools to enable the 
computational agent to make this transformation. While this is imperfect, it 
should help a bit. In short, we provide a practical mechanism to identify text 
containing mathematics and a mechanism to transform this to a single, stan- 
dardised representation. 
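The first of these checks, that a document imports MathJax, can be sketched
in Python using only the standard library. This is a hedged illustration of
what a computational agent might do; the test for "mathjax" in the script URL
is an assumed heuristic, and the names MathJaxDetector and page_uses_mathjax
are hypothetical.

```python
from html.parser import HTMLParser


class MathJaxDetector(HTMLParser):
    """Record whether any <script> tag loads a URL that mentions MathJax."""

    def __init__(self):
        super().__init__()
        self.uses_mathjax = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            # attrs is a list of (name, value) pairs; value may be None.
            src = dict(attrs).get("src") or ""
            if "mathjax" in src.lower():
                self.uses_mathjax = True


def page_uses_mathjax(html_text):
    """Return True if the page appears to import MathJax."""
    detector = MathJaxDetector()
    detector.feed(html_text)
    return detector.uses_mathjax
```

An agent that finds the import can then treat TeX-style delimiters in the page
as mathematics with some confidence, and hand them to MathJax for conversion.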

10 This is assuming MathJax works correctly in general. The authors and readers are 
checking the rendered representation. It is possible that an equation would render 
correctly on screen, but be rendered to MathML inaccurately.



4 Representing References 



Unlike mathematics, there is no standard mechanism for reference and in-text 
citation, but there are a large number of tools for authors such as BibTeX, 
Mendeley [13] or EndNote. As a result of this, the integration with existing 
toolsets is of primary importance, while the representation of the in-text citations 
is not, as it should be handled by the tool layer anyway. 

Within kblog, we have developed a plugin called kcite¹¹. For the author, citations
are inserted using the syntax:
[cite]10.1371/journal.pone.0012258[/cite].

The identifier used here is a DOI, or digital object identifier, and is widely
used within the publishing and library industry. Currently, kcite supports DOIs
minted by either CrossRef¹² or DataCite¹³ (in practice, this means that we support
the majority of DOIs). We also support identifiers from PubMed¹⁴, which
covers most biomedical publications, and arXiv¹⁵, the physics (and other domains!)
preprints archive, and we now have a system to support arbitrary URLs.
Currently, authors are required to state the type of the identifier where it is not a DOI.
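As an illustration of how such citations might be picked out of a document,
the following Python sketch extracts each shortcode together with its
identifier type. It is not kcite's own parser. The source attribute follows
the form [cite source='doi'] used elsewhere in this paper; treating a missing
attribute as a DOI, and the value 'pubmed' used in the example below, are
assumptions made for this sketch.

```python
import re

# Matches [cite]...[/cite] with an optional source='...' attribute.
CITE = re.compile(
    r"\[cite(?:\s+source=['\"](\w+)['\"])?\s*\](.*?)\[/cite\]",
    re.DOTALL,
)


def extract_citations(text):
    """Return (source, identifier) pairs for each [cite] shortcode in text."""
    citations = []
    for match in CITE.finditer(text):
        # Assumption: an identifier with no stated type is taken to be a DOI.
        source = match.group(1) or "doi"
        citations.append((source, match.group(2).strip()))
    return citations
```

For example, a paragraph containing [cite]10.1371/journal.pone.0012258[/cite]
yields the single pair ("doi", "10.1371/journal.pone.0012258").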

We have picked this "shortcode" format for similar reasons as described for
maths; it is relatively unambiguous, it is not XML-based, so passes through
the HTML generation layer of most authoring tools unchanged, and it is explicitly
supported in WordPress, bypassing the need for regular expressions and later
parsing. Typed by hand, it would be a little unwieldy from the perspective of the
author. In practice, however, it is relatively easy to integrate this with many
reference managers. For example, tools such as Zotero [14] and Mendeley use
the Citation Style Language, and so can output kcite-compliant citations with
the following slightly elided code:

<citation>
  <layout prefix="[cite]" suffix="[/cite]"
          delimiter="[/cite] [cite]">
    <text variable="DOI"/>
  </layout>
</citation>
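The effect of this layout can be mimicked in a few lines of Python: the prefix
opens the first shortcode, the delimiter closes one shortcode and opens the
next, and the suffix closes the last. This is a sketch of the resulting string
only, not of a CSL processor; the function name render_cluster and the second
DOI in the usage note are placeholders.

```python
def render_cluster(dois, prefix="[cite]", suffix="[/cite]",
                   delimiter="[/cite] [cite]"):
    """Join the DOIs of one citation cluster as the CSL layout would."""
    return prefix + delimiter.join(dois) + suffix
```

Given the two identifiers 10.1371/journal.pone.0012258 and 10.1000/example,
the result is one well-formed shortcode per reference, separated by a space.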

We do not yet support LaTeX/BibTeX citations, although we see no reason
why a similar style file should not be supported. We do, however, support
BibTeX-formatted files: the first author's preferred editing/citation environment
is based around these with Emacs, RefTeX, and asciidoc. While this is undoubtedly
a rather niche authoring environment, the (slightly elided) code for supporting
this demonstrates the relative ease with which tool chains can be induced to
support kcite:

11 http://wordpress.org/extend/plugins/kcite/

12 http://wordpress.org/extend/plugins/kcite/

13 http://www.datacite.org/

14 http://www.ncbi.nlm.nih.gov/pubmed/

15 http://arxiv.org/



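;; Wrap RefTeX's citation formatter: use the kcite formatter when the
;; override flag is set, otherwise fall through to the original.
(defadvice reftex-format-citation (around phil-asciidoc-around activate)
  (if phil-reftex-citation-override
      (setq ad-return-value (phil-reftex-format-citation entry format))
    ad-do-it))

;; Emit an asciidoc passthrough holding a kcite shortcode for the entry's DOI.
(defun phil-reftex-format-citation (entry format)
  (let ((doi (reftex-get-bib-field "doi" entry)))
    (format "pass:[[cite source='doi'\\]%s[/cite\\]]" doi)))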

The key decision with kcite from the authorial perspective is to ignore the
reference list itself and focus only on in-text citations, using public identifiers
to references. This simplifies the tool integration process enormously, as this
is the only data that needs to pass from the author's bibliographic database
onward. The advantage for authors here is two-fold. First, they are not required to
populate their reference metadata for themselves, and this metadata will update
if it changes. Second, the identifiers are checked; if they are wrong, the authors
will see this straightforwardly, as the entire reference will be wrong. Adding DOIs
or other identifiers moves from being a burden for the author to being
a specific advantage.

While supporting multiple forms of reference identifier (CrossRef DOI, DataCite
DOI, arXiv and PubMed ID) provides a clear advantage to the author, it
comes at considerable cost. While it is possible to get metadata about papers
from all of these sources, there is little commonality between them. Moreover,
resolving this metadata requires one outgoing HTTP request¹⁶ per reference,
which browser security might or might not allow.

So, while the presentation of mathematics is performed largely on the client, 
for reference lists the kcite plugin performs metadata resolution and data inte- 
gration on the server. A caching functionality is provided, storing this metadata 
in the WordPress database. The bibliographic metadata is finally transferred to 
the client encoded as JSON, using asynchronous call-backs to the server. 
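The server-side pattern described here (resolve once, cache, then serialise
as JSON) can be sketched in Python. Kcite itself is a WordPress plugin and
stores its cache in the WordPress database; the in-memory dictionary, the
class MetadataCache, the function fake_resolver and the record fields below
are all hypothetical stand-ins chosen so that the example runs without network
access.

```python
import json


class MetadataCache:
    """Cache bibliographic metadata keyed by (source, identifier)."""

    def __init__(self, resolver):
        self._resolver = resolver   # callable that fetches one record
        self._store = {}            # stand-in for a database table
        self.lookups = 0            # number of calls made to the resolver

    def get(self, source, identifier):
        key = (source, identifier)
        if key not in self._store:
            # Only unseen identifiers cost an outgoing request.
            self.lookups += 1
            self._store[key] = self._resolver(source, identifier)
        return self._store[key]


def bibliography_json(cache, citations):
    """Serialise the metadata for a list of (source, identifier) pairs."""
    return json.dumps([cache.get(s, i) for s, i in citations])


def fake_resolver(source, identifier):
    """Placeholder for a real metadata lookup; returns dummy fields."""
    return {"source": source, "id": identifier, "title": "(placeholder)"}
```

With this arrangement, a reference cited several times in one article, or
across several articles, is resolved against the external service only once.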

Finally, this JSON is rendered using the citeproc-js library on the client. In
our experience, this performs well, adding to the readers' experience: in-text
citations are initially shown as hyperlinks; rendering is rapid, even on aging
hardware; and, finally, in-text citations are linked both to the bibliography and
directly through to the external source. Currently, the format of the reference
list is fixed; however, citeproc-js is a generalised reference processor, driven using
CSL¹⁷. This makes it straightforward to change citation format, at the option
of the reader, rather than the author or publisher. Both the in-text citation
and bibliography support outgoing links direct to the underlying resource¹⁸.
As these links have been used to gather metadata, they are likely to be correct.
While these advantages are relatively small currently, we believe that the use of
JavaScript rendering over linked references can be used to add further reader
value in future.

16 In practice, it is often more; DOI requests, for instance, use 303 redirects. 

17 http://citationstyles.org/

18 Where the identifier allows - PubMed IDs redirect to PubMed. 



For the computational agent wishing to consume bibliographic information, 
we have added significant value compared to the pre-formatted HTML reference 
list. First, all the information required to render the citation is present in the 
in-text citation next to the text that the authors intended. A computational 
agent can, therefore, ignore the bibliography list itself entirely. These primary 
identifiers are, again, likely to be correct because the authors now need them to 
be correct for their own benefit. 

Should the computational agent wish, the (denormalised) bibliographic data 
used to render the bibliography is actually available, present in the underlying 
HTML as a JSON string. This is represented in a homogeneous format, although,
of course, it represents our (kcite's) interpretation of the primary data.

A final, and subtle, advantage of kcite is that the authors can only use public 
metadata, and not their own. If they use the correct primary identifier, and 
still get an incorrect reference, it follows that the public metadata must be 
incorrect¹⁹. Authors and readers therefore must ask the metadata providers to
fix their metadata to the benefit of all. This form of data linking, therefore, can 
even help those who are not using it. 

4.1 Microarray Data 

Many publications require that papers discussing microarray experiments lodge
their data in a publicly available resource such as ArrayExpress [15]. Authors
do this by placing an ArrayExpress identifier, which has the form E-MEXP-1551,
in the paper. Currently, adding this identifier to a publication, as with adding
the raw data to the repository, offers no direct advantage to the author, other
than fulfilment of the publication requirement. Similarly, there is no existing
support within most authoring environments for adding this form of reference.

For the knowledgeblog-arrayexpress plugin²⁰, therefore, we have again used a
shortcode representation, but allowed the author to automatically fill metadata,
direct from ArrayExpress. So a tag such as:

[aexp id="E-MEXP-1551"]species[/aexp]

will be replaced with "Saccharomyces cerevisiae", while:

[aexp id="E-MEXP-1551"]releasedate[/aexp]

will be replaced by "2010-02-24". While the advantage here is small, it is significant.
Hyperlinks to ArrayExpress are automatic, and authors no longer need to look
up detailed metadata. For metadata which authors are likely to know anyway
(such as species), the automatic lookup operates as a check that their ArrayExpress
ID is correct. As with references (see Section 4), the use of an identifier
becomes an advantage rather than a burden to the authors.
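The substitution can be sketched in Python as follows. The real plugin fetches
its values from ArrayExpress; here a small dictionary holding only the two
values quoted above stands in for that lookup. The names AEXP, SAMPLE_METADATA
and expand_aexp are hypothetical, and leaving an unresolved shortcode
untouched is a design choice of this sketch rather than documented behaviour
of the plugin.

```python
import re

# Matches [aexp id="..."]field[/aexp].
AEXP = re.compile(r'\[aexp\s+id="([^"]+)"\s*\](.*?)\[/aexp\]', re.DOTALL)

# Stand-in for a lookup against ArrayExpress, using the values from the text.
SAMPLE_METADATA = {
    "E-MEXP-1551": {
        "species": "Saccharomyces cerevisiae",
        "releasedate": "2010-02-24",
    },
}


def expand_aexp(text, metadata=SAMPLE_METADATA):
    """Replace each aexp shortcode with the requested metadata field."""

    def replace(match):
        accession = match.group(1)
        field = match.group(2).strip()
        record = metadata.get(accession, {})
        # If the accession or field is unknown, keep the shortcode visible
        # so that the author can spot the mistake.
        return record.get(field, match.group(0))

    return AEXP.sub(replace, text)
```

An author who mistypes the accession number sees either the raw shortcode or
an unexpected species name, which is the check described above.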

For the reader, there is currently less advantage, although
there is some value in the added correctness stemming from
the ArrayExpress identifier. However, knowledgeblog-arrayexpress is currently
under-developed, and the added semantics that is now present could be used
more extensively. The unambiguous knowledge that:

19 Or, we acknowledge, that kcite is broken! 

20 http://knowledgeblog.org/knowledgeblog-arrayexpress



[aexp id="E-MEXP-1551"]species[/aexp]

represents a species would allow us, for example, to link to the NCBI taxonomy
database²¹.

Likewise, advantage for the computational agent from knowledgeblog-arrayexpress
is currently limited; the identifiers are clearly marked up, and as the
authors now care about them, they are likely to be correct. Again, however,
knowledgeblog-arrayexpress is currently under-developed for the computational
agent. The knowledge that is extracted from ArrayExpress could be presented
within the HTML generated by knowledgeblog-arrayexpress, whether or not it
is displayed to the reader, for essentially no cost. By having an underlying shortcode
representation, if we choose to add this functionality to knowledgeblog-arrayexpress,
any posts written using it would automatically update their HTML.
For the text-mining bioinformatician, even the ability to unambiguously determine
that a paper described or used a data set relating to a specific species using
standardised nomenclature²² would be a considerable boon.

5 Discussion 

Our approach to semantic enrichment of articles is measured and evolutionary.
We are investigating how we can increase the amount of knowledge in
academic articles presented in a computationally accessible form. However, we
are doing so in an environment which does not require all the different aspects
of authoring and publishing to be overturned. Moreover, we have followed a
strong principle of semantic enhancement which offers advantages to both reader
and author immediately. So, adding references as a DOI, or other identifier,
'automagically' produces an in-text citation and a nicely formatted reference list.
That the reference list is no longer present in the article, but is a visualisation
over linked data, and that the article itself has become a first-class citizen of this
linked data environment, is a happy by-product.

This approach, however, also has disadvantages. There are a number of semantic
enhancements which we could make straightforwardly to the knowledgeblog
environment that we have not; the principles that we have adopted require
significant compromise. We offer here two examples.

First, there has been significant work by others on CiTO [16], an ontology
which helps to describe the relationship between the citations and a paper.
Kcite lays the groundwork for an easy and straightforward addition of CiTO
tags surrounding each in-text citation. Doing so would enable increased machine
understandability of a reference list. Potentially, we could use this to the
advantage of the reader also: we could distinguish between reviews and primary
research papers; highlight the authors' previous work; emphasise older papers
which are being refuted. However, to do this requires additional semantics from
the author. Although these CiTO semantic enhancements would be easy to insert
directly using the shortcode syntax, most authors will want to use their existing

21 http://www.ncbi.nlm.nih.gov/Taxonomy/

22 The standard nomenclature was only invented in 1753 and is still not used universally.



reference manager, which will not support this form of semantics; even if it does,
the authors themselves gain little advantage from adding these semantics. There
are advantages for the reader, but in this case not for both author and reader.
As a result, we will probably add such support to kcite; but, if we are honest,
we find it unlikely that, when acting as content authors, we will find the time to add
this additional semantics.

Second, our presentation of mathematics could be modified to automatically
generate MathML from any included TeX markup. The transformation could
be performed on the server, using MathJax; MathML would still be rendered
on the client to webfonts. This would mean that any embedded maths would
be discoverable because of the existence of MathML, which is a considerable
advantage. However, neither the reader nor the author gains any advantage from
doing this, while paying the cost of the slower load times and higher server load
that would result from running JavaScript on the server. Moreover, they would
pay this cost regardless of whether their content were actually being consumed
computationally. As the situation now stands, the computational user needs to
identify the inclusion of MathJax in the web page, and then transform the page
using this library, none of which is standard. This is clearly a serious compromise,
but we feel a necessary one.

Our support for microarrays offers the possibility of the most specific, and
highest, level of semantics of all of our plugins. Knowledge about a species or
a microarray experimental design can be precisely represented. However, almost
by definition, this form of knowledge is fairly niche and only likely to be of
relevance to a small community. We do note, however, that the knowledgeblog
process, based around commodity technology, does offer a publishing process that
can be adapted, extended and specialised in this way relatively easily. Ultimately,
the many small communities that make up the long tail of scientific publishing
add up to one large one.

6 Conclusion 

Semantic publishing is a desirable goal, but goals need to be realistic and achievable.
To move towards semantic publishing in kblog, we have tried to put in place
an approach that gives benefit to readers, authors and computational interpretation.
As a result, at this stage, we have light semantic publishing, but with
small but definite benefits for all.

Semantics give meaning to entities. In kblog, we have sought benefit by "say- 
ing" within the kblog environment that entity x is either maths, a citation or a 
microarray data entity reference. This is sufficient for the kblog infra-structure 
to "know what to do" with the entity in question. Knowing that some publish- 
able entity is a "lump" of maths tells the infra-structure how to handle that 
entity: the reader has benefit from it looking like maths; the author has benefit 
by not having to do very much; and the infra-structure knows what to do. In 
addition, this approach leaves in hooks for doing more later. 

It is not necessarily easy to find compelling examples that give advantages 
for all steps. Adding in CiTO attributes to citations, for instance, has obvious 



advantages for the reader, but not the author. However, advantages may be
indirect; richer reader semantics may give more readers and thus more citations,
the thing authors appreciate as much as the act of publishing itself. It is, however,
difficult to imagine how such advantages can be conveyed to the author at the
point of writing. It is easy to see the advantages of semantic publishing for
readers; as a community, we need to pay attention to advantages to the authors.
Without these "carrots", we will only have "sticks", and authors, particularly
technically skilled ones, are highly adept at working around sticks.